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Abstract 

A diffusion process on complex networks is introduced in order to uncover their large 
scale topological structures. This is achieved by focusing on the slowest decaying 
diffusive modes of the network. The proposed procedure is applied to real-world net- 
works like a friendship network of known modular structure, and an Internet routing 
network. For the friendship network, its known structure is well reproduced. In case 
of the Internet, where the structure is far less well-known, one indeed finds a mod- 
ular structure, and modules can roughly be associated with individual countries. 
Quantitatively the modular structure of the Internet manifests itself in an approxi- 
mately 10 times larger participation ratio of its slowest decaying modes as compared 
to the null model - a random scale-free network. The extreme edges of the Internet 
are found to correspond to Russian and US military sites. 
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1 Introduction 



Complex networks are natural structures for representing many real-world sys- 
tems. Their flexible and adaptive features make them ideal for interrelating 
various types of information and to update them as time goes by. Recently, 
there has been tremendous interest, from various fields of science, in the study 
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of the statistical properties of such networks [1,2]. By now, many interesting 
features of complex networks have been established. So far, the main atten- 
tion in the study of complex networks has been on the properties of individual 
nodes and how they connect (link) to their nearest neighbors. However, ex- 
ploring the local properties of the network outside the nearest neighbor level, 
has not been well studied. A noteworthy exception is the recent study by Gir- 
van and Newman [3]. Such studies will naturally have to address the cluster 
(modular) structure of the network. To know if a network is modular or not, 
is important if one tries to assess its robustness and stability, since only a few 
critical links are responsible for the inter-modular communication in clustered 
networks. Hence to make the network more robust, such critical links have to 
be identified and strengthen. 

In this work we will address the large scale topological structures of networks. 
This problem will be approached by considering an auxiliary diffusion process 
on the underlying complex network. As it will turn out, it is the slowest decay- 
ing diffusive eigenmodes that will be of most interest to us, since such modes 
will contain, as we will see, information about the weakly interacting modules 
of the network. Hence, by studying how an ensemble of random walkers slowly 
reaches a state of equilibrium, one can learn something about the topology of 
the network. 

To study spectral properties of networks is not new; Its variants has previously 
been applied to random graphs [4], to social networks (the correspondence 
analysis) [5], random and small- world networks (the Laplace equation analy- 
sis) [6], artificial scale-free networks [7,8], and community structures [3,9]. A 
diffusion approach has also made it to practical applications. For instance, the 
analysis of a diffusion process lies at the heart of the popular search engine 
Google [10]. 



2 A Markovian diffusion process on complex networks 

The original physical motivation behind the method to be outlined below, was 
that the relaxation of some arbitrary initial state (of walkers on the network) 
toward the steady state distribution, was expected to be fast in regions that 
are highly connected, while slow in regions that had low connectivity. By defi- 
nition, a module is highly connected internally, but has only a small number of 
links to nodes outside the module. Hence, it was reasoned, that by identifying 
the slowly decaying eigenmodes of the diffusive network process, one should 
be able to obtain information about the large scale topological structures, like 
modules, of the underlying complex network. We will now formalize this idea 
and outline the method in some detail. 
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Let us start by assuming that we are dealing with a (fully connected) complex 
network consisting of N nodes and characterized by some degree distribution 
p(k) where fcj denotes the connectivity of node i. Imagine now placing a large 
number (>> N) of random walkers onto this network. The fraction of walkers 
on node i (relative the total number) at time t we will denote by Piit). In each 
time step, the walkers are allowed to move randomly between nodes that are 
directly linked to each other. Since there is no way that a walker can vanish 
from the network, the total number of walkers must be conserved globally, 
i.e. one must have J2iPi(t) = 1 at all times. Furthermore, locally a continuity 
equation must be satisfied; If we consider, say node i, that directly links to 
other nodes j, one must require that (the balance equation): 



fH(t+ 1) -m*)=Ea^ -E^hP- (!) 

3 J 3 1 

In writing this equation, we have introduced the so-called adjacency matrix 
Aij = Aji defined to be 1 if node % and j are directly linked to each other, and 
otherwise [1,2], and k h we recall, is the degree of node i, i.e. its number of 
nearest neighbors hi = J2j My The first term on the right hand side of Eq. (1) 
describes the flow of walkers into node i, while the last term is associated with 
the out-flow of walkers from the same node. As will be useful later, Eq. (1) 
can be casted into the following equivalent matrix form: 



d t p(t)=Bp(t), (2) 

where d t p(t) = p(t + 1) — pit), and D is the diffusion matrix to be defined 
below. Eq. (2) should be compared to the continuous diffusion equation for 
the particle density: d t p(r,t) = D'V 2 p(r,t), where D is the diffusion con- 
stant. A complex network obviously has an inherent discreetness, and hence 
no continuous limit can be taken. However, if the continuous diffusion equa- 
tion is understood in its discreet form, one is lead to regard Eq. (2) as the 
master equation for the random-walk process taking place on the underly- 
ing network. By comparing Eqs. (1) and (2), as well as taking advantage of 
J2j AjiPi(t)/ki = pi(t) (because J2i Mj — kj), one is lead to = A^/kj — 5ij. 
Hence, an equivalent formulation of Eq. (2), is 



p(t+l) = Tp(t), (3) 

where T is the transfer matrix, related to the diffusion matrix by T = D + l. In 
component form one thus has T^- = Aij/kj. Since the transfer matrix, "trans- 
fers" the walker distribution one time step ahead, it can therefore be thought 
of as a time-propagator for the process. Formally the time development, from 
some arbitrarily chosen initial state p(0), can be obtained by iteration on 
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Eq. (3) with the result p(t) = T'p(O). Here T* means the transition matrix 
to the t'th power. 

In general, the transfer matrix (or equivalently, the diffusion matrix) will not 
be symmetric. However, T can be related to a symmetric matrix S by the 
following similarity transformation S = KTK 1 with = bijj\fki. Thus, 
T and S will have the same eigenvalue spectrum, and all eigenvalues X^ 
(of T) will be real. It is this eigenvalue spectrum that will control the time- 
development of the diffusive process through A* with Ay = 5ij\( a \ It should 
be noted that since the total number of walkers is conserved at all times, one 
must have —1 < < 1, and that at least one eigenvalue should be one. 1 
We will here adopt the convention and sort the eigenvalues so that a = 1 
corresponds to the largest eigenvalue A^ = 1, A^ to the next to largest 
one, and so on. Physically the principal eigenvalue X^ = 1, corresponds to a 
stationary state where p(oo) oc p^\ the diffusive current flowing from node 
% to node j is exactly balanced by that flowing from j to %. The stationary 
state is unique for single component networks, and is fully determined by 
the connectivities of the network according to Pi(oo) = ki/(J2iki) °t ki. This 
can be easily checked by substituting this relation into Eq. (3). All modes 
corresponding to eigenvalues |A^| ^ 1, are decaying modes since the time 
dependence of p(t) enters through A* and one recalls that |A( Q )| < 1. Notice 
that A^ > represent non-oscillatory modes, while A^ < correspond to 
states where oscillations will take place with time, but this latter possibility 
will not be considered here. 

The large scale topology of a given complex network reflects itself in the sta- 
tistical properties of its diffusion eigenvectors p( a \ One such property is the 
Participation Ratio (PR), that will be defined below. It quantifies the effective 
number of nodes participating in a given eigenvector with a significant weight. 
Since the stationary state of a network depends on the connectivities of its 
nodes, Pi(oo) oc ki, it is convenient to introduce a normalized eigenvector 



C W M = ^. (4) 

Observe that cf^ is nothing but the walker density per link of node i. Hence, 
in effect, represents the outgoing currents flowing from node i, along each 
of its links, toward its neighbors. In the steady state, where Pi(oo) oc ki, it 
follows that these currents are all the same for any link in the network. Hence, 
highly connected nodes are not treated differently from less connected nodes. 
More formally, are the eigenvectors of the transposed transfer matrix 



1 For the diffusion matrix, this means that its spectrum, {A}/}, must satisfy —2 < 
A ( " } < since = - 1. 
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corresponding to the same eigenvalue \^ (of T). Here one alternatively could 
have defined Tt (instead of T) as the transfer matrix from the very beginning. 
Doing so, would have resulted in a master equation of the form (3), but for the 
currents c(t) with corresponding eigenmodes c^ a \ The physical interpretation 
of such an alternative equation is as follows: Instead of walkers, think of a 
signal propagating on the underlying network. The signal at node % at time 
t + 1 is then the average of the signal at the nearest neighbors at time t. 

If the link currents are normalized to unity, i.e. if ||c^||2 = 1 in the L2-norm, 
then the participation ratio (PR) is defined as [11] 



Xa ' 



i=l 



(5) 



For the stationary state, where all the currents are equal to q 1 ^ = 1/ y/~N, one 
has Xi — N where we recall that N is the total number of nodes in the net- 
work. Hence, when q^I, the ratio Xa can be regarded as an effective number 
of nodes participating in the eigenvector with a significant weight [11,12]. 
It should be noted that the participation ratio is a simple and crude measure 
of size. Strictly speaking, for this ratio to be able to say something with con- 
fidence about the size of a given module, the main contribution to Xa should 
come from nodes within that module. For instance, if Xa gets major contribu- 
tions from current elements cf^ of different signs, i.e. from different topological 
structures, it is not trivial to relate Xa to the size of a single module. 

For all types of complex networks where a modular structure is of interest, it 
is important to try to quantify the number of modules. Such information is 
of interest since this number may say something about the organizing prin- 
ciples being active in the generating process of the network. However, the 
measurement of this number is often hampered by statistical uncertainties 
due to temporary (non-robust) topological structures being counted as mod- 
ules. Hence the challenge is to obtain the significant (or robust) number of 
modules that are due to the "rule of creation" and not just happened to be 
there by chance. For this purpose, it is useful to introduce a null model - 
a randomized version of the network at hand — but so that the degree dis- 
tribution of the original network is not changes [13]. A rough estimate of the 
number of different modules contained in a network could be given by the 
number of slowly decaying non-oscillatory modes that have a participation 
ratio, Xa, significantly exceeding the (ensemble) averaged participation ration 
of the corresponding randomized network, Xo°- Hence, it is suggested that a 
module is significant if Xa 

We will close this section by noting that the (outgoing) currents, cf , can also 
be used for vitalization purposes. The basis of this approach is the observation 
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that the outgoing currents on links within a module are almost constant, and 
the size of this constant depends on to which extent the module participates 
in the given eigenmode. The constants thus varies from module to module and 
from eigenmode to eigenmode. Hence, by sorting cf^ by size, nodes belonging 
to the same module will be located close to each. 



3 The data sets 

During a two year period in the mid 1970's, W. Zachary studies the rela- 
tions among 34 members of a university karate club during a period of trou- 
ble [14] . A serious controversy existed between the trainer of the club (node 1 
in Fig. la) and its administrator (node 34). Ultimately this conflict resulted 
in the breakup of the club into two new clubs of roughly the same size. In 
his original study [14], W. Zachary mapped out the strength of friendship 
between the various members. In our study, however, we will be considering 
the unweighted version of his network [15] (see Fig. la). Recently, this same 
network was used by Girvan and Newman [3] in their study of community 
structures. 

The second network that will be considered is a much larger network taken 
from the organization of the Internet. The Internet consists of a large num- 
ber of individual computers that are identified by their so-called IP-address, 
and they are usually grouped together in local area networks (LAN). Such 
computer networks are connected to one another via routers — a complex 
network linking device. In addition to being able to direct pieces of informa- 
tion to its intended destination, a router also has the ability to determine the 
best path to a given destination (routing). This is done by keeping available 
an updated routing table telling the router how to reach certain destinations 
specified by the network administrator. In much the same way as computers 
are being organized into networks, the same is done for routers. If the router 
network becomes large and coherent enough, it may make out what is called 
an Autonomous System (AS). An AS is a connected segment of a network that 
consists of a collection of subnetworks (with hosts attached) interconnected 
by a set of routes. Usually it is required that the subnetworks and the routers 
should be controlled by a coherent organization — like, say, a university, or a 
medium-to-big enterprise. Importantly for the efficiency of the Internet, each 
Autonomous System, identified by a unique AS number, is expected to present 
to the other systems a consistent list of destinations reachable through that 
AS, as well as a coherent interior routing plan. 

The particular network we will be considering is an Autonomous System net- 
work collected by the Oregon Views project on January 3, 2000 [16]. In this 
network the AS will act as nodes, while their routing plans, i.e. with which 



6 



other AS a given one directly shares information, will correspond to the links 
of the network. 



4 Results and discussion 

4-1 The Zachary Friendship Network 

In Sec. 2, it was argued that from the currents corresponding to the slowly 
decaying eigenmodes, one should be able map out the large scale structure 
of the underlying network. Our hypothesis will now be put to the test, and 
we start with the small friendship network due to Zachary [14]. This network 
is depicted in Fig. la, and it is small enough to enable us to see how it all 
comes about. By visual inspection of this figure, it is apparent that there are, 
at least, two large scale clusters — one corresponding to the trainer of the 
karate club (node 1) and his supporters, and one to the administrator (node 
34). In Fig. la, the karate club members (according to Zachary) supporting 
the trainer in the ongoing conflict are marked with squares (16 nodes), while 
the followers of the administrator are marked with open circles (18 nodes). 
Notice that this subdivision was done by Zachary [14], and no sophisticated 
clustering techniques were applied to obtain these results. 

We will now see if the same clustering structure can be obtained by the method 
outlined in Sec. 2 of this paper. Let us start by considering the two most slowly 
decaying modes of the network, namely a = 2 and 3 (with our ordering), cor- 
responding to the eigenvalues A*- 2 -* = 0.87 and A*- 3 - 1 = 0.71, respectively. If one 
sorts the elements of the current vector c^ 2 ) by size, and group them according 
to their signs, one recovers two clusters (see the abscissa of Fig. lb). This 
grouping, or topological structure, fits nicely with the original assignments 
made by Zachary (Fig. la) and indicated by the open squares and circles in 
Fig. lb. Hence it is the trainer-administrator separation of the karate club 
members that is mapped out by the concurrents. Observe that node 3, which 
has an equal number of links to supporters of the trainer and administra- 
tor, has a current that is almost zero. Our identification of this node is the 
only one that differs from that made originally by Zachary. Interestingly, our 
classification, including node 3, fits that previously obtained by Girvan and 
Newman [3] in their hierarchical tree clustering approach. 

If the same procedure was repeated, but for the currents c®, one would map 
out other topological features of the network. In Fig. lb, a c^c^-plot is 
presented. Such a plot groups the nodes along two axis; the c^-axis — cor- 
responding to the trainer-administrator axis — and the c^-axis. The most 
striking feature of Fig. lb is the group of nodes corresponding to < and 
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c^ 3 ) > 0, i.e. to the following group of nodes {5, 6, 7, 11, 17}. In Fig. la, these 
nodes are located in the upper left corner of the graph. They are in addition to 
being connected among themselves, only connected to the rest of the network 
via the trainer. Thus, they represent a sub-cluster, and this is indeed what is 
apparent from Fig. lb. 

The participation ratios for the two slowest decaying diffusive modes were 
found to be X2 = 14.0 and X3 — 13.1, respectively. As can be seen from 
Fig. lb, these participation ratios receive substantial contributions from both 
positive and negative current elements, i.e. from more then one topological 
structure. Hence, we suspect that these numbers being close to the size of 
the administrator and trainer "clan" (of size 18 and 16 nodes, respectively) is 
somewhat accidental. 



4-2 The Autonomous System Network 

We will now focus our attention on the Autonomous System (AS) network 
introduced in the previous Section. This network, consisting of N = 6,474 
nodes and 12, 572 undirected links, is so big that plotting it for the purpose 
of study its modular structure is not a practical option. Motivated by what 
we found for the small Zachary friendship network, we will now go ahead and 
use somewhat similar techniques for its study. 

In Fig. 2 the participation ratio, Xa, of eigenvectors c[°^ (top) and the eigen- 
value density (bottom) are plotted as functions of the corresponding eigen- 
values — 1 < < 1. The data for the Internet, that is an example of a 
Scale- Free Network [17], are displayed together with the data for its random- 
ized counterpart (the null model). From Fig. 2 it is apparent that while the 
density of states is rather similar for these two networks, the participation ra- 
tios of the slowly decaying modes, especially for A*") close to 1, are markedly 
higher in the Internet network than in the accompanying randomized network. 
These differences signal large scale topological structures [12] that are real and 
not accidental. For the most slowly decaying diffusive eigenmode, the partic- 
ipation ratio is % 2 ~ 100 (cf. Fig. 2a). From Fig. 3, that depicts the currents 
of the two slowest decaying diffusive eigenmodes, one observes that the main 

(2) 

contribution to X2 comes from current elements c\ of the same sign. Thus, 
a module has been detected, and X2 roughly measures its size. From, Fig. 3 
one also makes the interesting observation that the main contributing nodes 
to X2 are Autonomous Systems located in Russia (denoted by open squares 
in Fig. 3). Hence, the diffusive mode a = 2 maps out Russia, as a Russian 
module [12]! In total there are 174 Russian nodes in our data set. Moreover, 
from Fig. 3, modules corresponding to countries like the USA, France and 
Korea are also easily identified. The edges of the AS Internet network, defined 
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as the nodes corresponding to the most distant values of the current elements 

(2) 

c\ , are in this case represented by a Russian and a US military site located 
in the South Pacific. These nodes can thus be said to represent the extreme 
edges of the Internet. One has also studies the number of significant modules 
(where Xa 3> Xa D )i an d found that this number for the AS network is roughly 
M = 100 [12]. The number of different nodes participating in one or more of 
these M modules was found to be about Nm ~ 1800. Hence the modularity 
of the network is (at least) N M /N ~ 30%. 



4-3 Generic features of the current- current plots 



One common feature of the current-current plots, Figs, lb and 3, is their line 
or star-like structures. Such structures are in fact rather generic, and we will 
here try to explain why. We have argued earlier that current elements cf 1 ^ 
should almost be the same within a module. Hence, for two nodes i and j 
belonging to one and the same module, the fraction cf^/c^ is expected to 
be a constant unique for that module, but independent of the diffusive mode 
a. For two different (significant) diffusive modes, say a and f3, the fraction 
7 = cf 1 ^ /cf^ (for node i) will therefore as a consequence also be constant. 
Thus, under the assumptions made above, one predicts a straight line in a 
current-current plot (that pases through the origin) [12] : c\°^ = 7 cf \ for 
nodes i belonging to the same module. From Fig. 3, this simple argument 
appears to be valid. That Fig. lb seems to not pass through the origin, but 
still being straight lines, we believe is due to the diffusive modes being excited 
over a non-trivial background in this case. 



4-4 Numerical implementation 



Before presenting the conclusions of this paper, we will add a few closing 
remarks on the numerical implementation of the method. For this work to 
have any relevance for large real- world networks, a fast, memory saving, and 
optimized algorithm for the calculation of the largest eigenvalues and the 
corresponding eigenvectors of a sparse matrix is required. Fortunately, such 
an algorithm has already been implemented and made available e.g. through 
the TRLan software package [18]. This software is optimized for handling large 
problems, and it can run on large scale parallel supercomputers. 
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5 Conclusions 



We have generalized the normal diffusion process to diffusion on (discreet) 
complex networks. By considering such a process, it has been demonstrated 
that topological properties — like modular structures and edges — of the 
underlying network can be probed. This is achieved by focusing on the slowest 
decaying eigenmodes of the network. The use of the procedure was exemplified 
by considering a small friendship network, with known modular structure, as 
well as a routing network of the Internet, where the structure was not so well 
known. For the friendship network the known structure was well reproduced, 
and the Internet was indeed found to be modular. The detected modules 
of the Internet were consistent with the geographical location of the nodes, 
and the individual modules corresponded roughly to the national structure. 
Interestingly, it was observed that a political subdivision of the Internet was 
also one of the predictions of the algorithm presented in this paper; The two 
most poorly connected nodes of the Internet (extreme edges), where found 
to be represented by a Russian and a US military site located in the South 
Pacific. 
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Fig. 1. (a) Zachary's friendship network [14] of the "troubled" karate club consisting 
of N = 34 nodes and L = 78 links (Figure after Ref. [3]). Here open squares and 
circles are used to denote the supporters, in the ongoing conflict, of the trainer 
(node 1) and administrator (node 34), respectively, (b) The c^c® -plot maps out 
the large scale topology of the network. The dashed lines indicate the lines of zero 
currents. 
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Fig. 2. The participation ratio Xa (top, A) and the eigenvalue density (bottom, B) 
as a function of the eigenvalue of the transfer matrix, — 1 < X^ < 1, measured in 
the Internet (filled circles) and in its randomized counterpart (open squares) — a 
Random Scale-Free Network. The participation ratio was averaged over A-bins of 
size 0.05 excluding eigenmodes = 0, and A^ = 1 (cf. Ref. [12]). 
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Fig. 3. The Internet clustering: The coordinates of the i-th AS in this plot are 

(2) (3) 

its current components (c\ ,c\ ) of the two slowest decaying non-oscillatory dif- 
fusion modes. The symbols reveals the geographical location of the AS: Russia - 
squares, France - circles, USA - crosses, Korea - triangles). Note the straight lines 
corresponding to good country- modules. 
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